TTP: A Fast And Robust Parser For Natural Language

Author

  • Tomek Strzalkowski
Abstract

In this paper we describe TTP, a fast and robust natural language parser which can analyze written text and generate regularized parse structures for sentences and phrases at a speed of approximately 0.5 sec/sentence, or 44 words per second. The parser is based on a wide-coverage grammar for English, developed by New York University's Linguistic String Project, and it uses the machine-readable version of the Oxford Advanced Learner's Dictionary as the source of its basic vocabulary. The parser operates on stochastically tagged text, and contains a powerful skip-and-fit recovery mechanism that allows it to deal with extra-grammatical input and to operate effectively under severe time pressure. Empirical experiments testing the parser's speed and accuracy were performed on several collections: a collection of technical abstracts (CACM-3204), a corpus of news messages (MUC-3), a selection from the ACM Computer Library database, and a collection of Wall Street Journal articles, approximately 50 million words in total.

Similar resources

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of tree banks is becoming more widely appreciated in linguistics research as a whole. F...

Feature Engineering in Persian Dependency Parser

A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages, such as Persian. In this paper, a data-driven dependency parser has been developed with the help of phrase-structure parser fo...

A Broad-coverage, Representationally Minimal Lfg Parser: Chunks and F-structures Are Sufficient

A major reason why LFG employs c-structure is that it is context-free. According to Tree-Adjoining Grammar (TAG), the only context-sensitive operation that is needed to express natural language is Adjoining, from which LFG functional uncertainty has been shown to follow. Functional uncertainty, which is expressed on the level of f-structure, would then be the only extension needed to an other...

A Robust And Hybrid Deep-Linguistic Theory Applied To Large-Scale Parsing

Modern statistical parsers are robust and quite fast, but their output is relatively shallow when compared to formal grammar parsers. We suggest extending statistical approaches to a deeper linguistic analysis while at the same time keeping the speed and low complexity of a statistical parser. The resulting parsing architecture suggested, implemented and evaluated here is highly robust and h...

Using An Incremental Robust Parser To Automatically Generate Semantic UNL Graphs

The UNL project (Universal Networking Language) proposes a standard for encoding the meaning of natural language utterances as semantic hypergraphs, intended to be used as a pivot in multilingual information and communication systems. Several deconverters make it possible to automatically translate UNL utterances into natural languages. However, a rough enconversion from natural language texts to UNL expre...

Publication date: 1992